-
- .nr tp 12
- .nr sp 12
- .nr pp 10
- .nr fp 8
- .sz \n(pp
- .vs 18 \" set 1.5 spacing between lines
- .nr $r \n(.vu/\n(.su
- .nr $R \n($r
- .fo ''%''
- .(l C
- .sz \n(sp
- .b
- THE IMPLEMENTATION OF POSTGRES
-
- .sz \n(pp
- .sp 3
- .i
- Michael Stonebraker, Lawrence A. Rowe and Michael Hirohama
- EECS Department
- University of California, Berkeley
- .sp
- .r
- .)l
- .sp 3
- .(f
- This research was sponsored
- by the Defense Advanced Research Projects Agency
- through
- NASA Grant NAG 2-530
- and
- by the Army Research Office through
- Grant DAALO3-87-K-0083.
- .)f
- .uh Abstract
- .pp
- Currently, POSTGRES is about 90,000
- lines of code in C and is being used by
- assorted ``bold and brave''
- early users. The system has been constructed by a team of
- 5 part-time students led by a full-time chief
- programmer over the last three years. During
- this period, we have made a large number of design and
- implementation choices.
- Moreover, in some areas we would do
- things quite differently if we were to start from
- scratch again. The purpose of this paper is to
- reflect on the design and implementation decisions
- we made and to offer advice to implementors who
- might follow some of our paths.
- In this paper we restrict our attention to
- the DBMS ``backend'' functions. In another paper some of us
- treat PICASSO, the application development environment that is being
- built on
- top of POSTGRES.
- .sh 1 "INTRODUCTION"
- .pp
- Current relational DBMSs are oriented toward efficient support for
- business data processing applications where large numbers of instances
- of fixed format records must be stored and accessed.
- Traditional transaction management and query facilities for this
- application area will be termed
- .b data
- .b management.
- .pp
- To satisfy the broader application community outside
- of business applications,
- DBMSs will have to expand to offer services in two other
- dimensions, namely
- .b object
- .b management
- and
- .b knowledge
- .b management.
- Object management entails efficiently storing and manipulating
- non-traditional data types such as bitmaps, icons, text, and polygons.
- Object management problems abound in CAD and many
- other engineering applications.
- Object-oriented programming languages and data bases provide services
- in this area.
- .pp
- Knowledge management entails
- the ability to store and enforce a collection of
- .b rules
- that are part of the semantics of an application.
- Such rules describe integrity constraints about the application,
- as well as
- allowing the derivation of data that is not directly stored in
- the data base.
- .pp
- We now present a simple example which requires
- services in all three dimensions.
- Consider an application that stores and manipulates text and graphics
- to facilitate the layout of
- newspaper copy.
- Such a system will be naturally integrated with subscription
- and classified advertisement data. Billing customers for these
- services will require traditional data management services. In addition,
- this application must store non-traditional objects including text,
- bitmaps (pictures), and icons (the banner across the top of the paper).
- Hence, object management services are required.
- Lastly, there are many rules that control newspaper layout.
- For example, the ad copy for two major department stores can never be
- on facing pages. Support for such rules is
- desirable in this application.
- .pp
- We believe that
- .b most
- real
- world data management problems are
- .b three
- .b dimensional.
- Like the
- newspaper
- application, they will
- require a three dimensional solution.
- The fundamental goal of POSTGRES [STON86, WENS88] is to
- provide support for such three dimensional
- applications. To the best of our knowledge
- it is the first
- three
- dimensional
- data manager.
- However, we expect that most DBMSs will
- follow the lead of POSTGRES into these new dimensions.
- .pp
- To accomplish this objective, object and rule management
- capabilities were added to the services found in a traditional data manager.
- In the next two sections we describe the capabilities provided and comment
- on our implementation decisions.
- Then, in Section 4 we discuss the novel
- .b no-overwrite
- storage manager that we implemented in POSTGRES.
- Other papers have explained
- the
- major
- POSTGRES design decisions in these areas, and we assume that the reader is
- familiar with [ROWE87] on the data model, [STON88] on
- rule management, and [STON87] on storage management. Hence, in these
- three sections
- we stress considerations that led to our design,
- what we liked about the design, and the mistakes that we felt
- we made.
- Where appropriate we make suggestions for future implementors based
- on our experience.
- .pp
- Section 5 of the paper comments on specific issues
- in the implementation of POSTGRES and critiques the choices that we
- made.
- In this section we discuss how we interfaced to the operating system,
- our choice of programming languages and some of our implementation philosophy.
- .pp
- The final section concludes with some performance measurements of
- POSTGRES. Specifically, we report the results of
- some of the queries in the Wisconsin
- benchmark [BITT83].
- .sh 1 "THE POSTGRES DATA MODEL AND QUERY LANGUAGE"
- .sh 2 "Introduction"
- .pp
- Traditional relational DBMSs support a data model consisting
- of a collection of named relations, each attribute of which has
- a specific type. In current commercial systems possible
- types are floating
- point numbers, integers, character strings, and dates. It is commonly
- recognized that this data model is insufficient for non-business
- data processing applications.
- In designing a new data model and query language, we were guided by the
- following three design criteria.
-
- 1) Orientation toward data base access from a query language
- .pp
- We expect POSTGRES users to interact with their data bases primarily by using
- the set-oriented query language, POSTQUEL. Hence, inclusion of a query
- language, an optimizer and the corresponding run-time system was
- a primary design goal.
- .pp
- It is also possible to interact with a POSTGRES data base
- by utilizing a navigational interface. Such interfaces were
- popularized by the CODASYL proposals of the 1970's and are
- enjoying a renaissance in recent object-oriented proposals
- such as ORION [BANE87] or O2 [VELE89]. Because POSTGRES
- gives each record a unique
- identifier (OID), it is possible to use the identifier for one record as a data
- item in a second record. Using optionally definable indexes on OIDs, it
- is then possible to navigate from one record to the
- next by running one query per navigation
- step.
- In addition, POSTGRES allows a
- user to define functions (methods) to the DBMS. Such functions
- can intersperse statements in a programming language, query language
- commands, and direct calls to internal POSTGRES interfaces.
- The ability to directly execute
- functions which we call
- .b fast
- .b path
- is provided in POSTGRES and
- allows a user to navigate the data base by executing a sequence of
- functions.
- .pp
- However,
- we do not expect this sort of mechanism to become popular. All
- navigational interfaces have the same disadvantages as CODASYL systems,
- namely the application programmer must construct a query plan for each
- task he wants to accomplish and substantial application maintenance
- is required whenever the schema changes.
-
- 2) Orientation toward multi-lingual access
- .pp
- We could have picked our favorite programming language
- and then tightly coupled POSTGRES to the compiler and run-time
- environment of that language. Such an approach would
- offer
- .b persistence
- for variables in this programming language, as well as a query
- language integrated with the control statements of the language.
- This approach has been followed in
- ODE [AGRA89] and many of the recent commercial start-ups doing object-oriented
- data bases.
- .pp
- Our point of view is that most data bases are accessed
- by programs written in several different languages, and we
- do not see any
- programming language Esperanto on the horizon.
- Therefore, most application
- development organizations are
- .b multi-lingual
- and require access to a data base from different languages. In addition,
- data base application packages that a user might acquire, for example
- to perform statistical or spreadsheet services,
- are often not coded in the language being used for developing applications.
- Again, this results in a multi-lingual environment.
- .pp
- Hence, POSTGRES is programming language
- .b neutral,
- that is, it can be called
- from many different languages.
- Tight integration of POSTGRES to a particular
- language requires compiler extensions and a run time system specific
- to that programming
- language.
- One of us has built an implementation of persistent CLOS (Common LISP Object System)
- on top of POSTGRES.
- Persistent CLOS (or persistent X for any programming language, X)
- is inevitably language specific. The run-time system must map
- the disk representation for language objects, including pointers,
- into the main memory representation expected by the language. Moreover,
- an object cache must be maintained in the program address space,
- or performance will suffer badly. Both tasks are inherently
- language specific.
- .pp
- We expect many language specific interfaces to be built for POSTGRES and
- believe that the query language plus the
- .b fast
- .b path
- interface available
- in POSTGRES offers
- a powerful, convenient abstraction against which to build
- these programming language interfaces.
-
- 3) Small number of concepts
- .pp
- We tried to build a data model with as few concepts as possible. The
- relational model succeeded in replacing previous data models in
- part because of its simplicity. We wanted to have as few concepts as possible
- so that users would have minimum complexity to contend with.
- Hence, POSTGRES leverages the following three constructs:
- .(l
- types
- functions
- inheritance
- .)l
- In the next subsection we briefly review the POSTGRES data model. Then, we turn to
- a short description of POSTQUEL and fast path. We conclude the section with a
- discussion of whether POSTGRES is object-oriented followed by a critique of our data model and
- query language.
- .sh 2 "The POSTGRES Data Model"
- .pp
- As mentioned in the previous section POSTGRES leverages
- .b types
- and
- .b functions
- as fundamental constructs. There are three kinds of types
- in POSTGRES and three kinds of functions and we discuss the six possibilities
- in this section.
- .pp
- Some researchers, e.g.
- [STON86b, OSBO86], have argued that one should be able
- to construct new
- .b base
- .b types
- such as
- bits, bitstrings,
- encoded character strings, bitmaps,
- compressed integers, packed decimal numbers, radix 50
- decimal numbers, money, etc.
- Unlike most next generation DBMSs which have a hard-wired collection
- of
- base
- types (typically integers, floats and character strings), POSTGRES
- contains an
- .b abstract
- .b data
- .b type
- facility whereby any user
- can construct
- an arbitrary number of new base types.
- Such types can be added to the system while it is executing
- and require the defining user to
- specify functions to convert instances of the type to and from the character string
- data type. Details of the syntax appear in [WENS88].
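- For concreteness, the sketch below shows, in C, the pair of conversion
- routines a user might register for a simple ``point'' base type; the
- internal layout and the function names are ours, and the actual
- registration syntax appears in [WENS88].
- .(l
- /* a minimal sketch of the to/from character string conversions that
-    every user-defined base type must supply; names are illustrative */
- #include <stdio.h>
- #include <stdlib.h>
- typedef struct { double x, y; } POINT;
- /* from the external form "(x,y)" to the internal form */
- POINT *point_in(char *str)
- {
-     POINT *p = (POINT *) malloc(sizeof(POINT));
-     if (sscanf(str, " ( %lf , %lf )", &p->x, &p->y) != 2) {
-         free(p);
-         return NULL;             /* ill-formed input string */
-     }
-     return p;
- }
- /* from the internal form back to a character string */
- char *point_out(POINT *p)
- {
-     char *buf = (char *) malloc(64);
-     sprintf(buf, "(%g,%g)", p->x, p->y);
-     return buf;
- }
- .)l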
- .pp
- The second kind of type available in POSTGRES is a
- .b constructed
- .b type.**
- .(f
- ** In this section the reader can use the words
- .b constructed
- .b type,
- .b relation,
- and
- .b class
- interchangeably. Moreover, the words
- .b record,
- .b instance,
- and
- .b tuple
- are similarly interchangeable.
- This section has been purposely written with
- the chosen notation to illustrate a point about
- object-oriented data bases which is discussed in Section 2.5.
- .)f
- A user can create a new type by constructing a
- .b record
- of
- base types and instances of other constructed types. For example:
- .(l
- create DEPT (dname = c10, floor = integer, floorspace = polygon)
- create EMP (name = c12, dept = DEPT, salary = float)
- .)l
- Here, DEPT is a type constructed from an instance of each of three base types, a character string,
- an integer and a polygon. EMP, on the
- other hand, is fabricated from base types and other constructed types.
- .pp
- A constructed type can optionally
- .b inherit
- data elements from other constructed types. For example, a SALESMAN type
- can be created as follows:
- .(l
- create SALESMAN (quota = float) inherits EMP
- .)l
- In this case, an instance of SALESMAN has a quota and inherits all data elements from EMP, namely name, dept and salary.
- We had the standard discussion about whether to include single
- or multiple inheritance and concluded that a single inheritance scheme would simply
- be too restrictive. As a result POSTGRES allows a constructed type to inherit from an arbitrary
- collection of other constructed types.
- .pp
- When ambiguities arise because an object has multiple
- parents with the same field name, we elected
- to refuse to create the new type. However, we
- isolated the resolution semantics in a single routine,
- which can be easily changed to track multiple inheritance semantics as they
- unfold over time in programming languages.
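- The sketch below suggests what that routine looks like; the data
- structures are assumptions made for illustration only.
- .(l
- /* refuse a new constructed type when the same field name is inherited
-    from more than one parent; changing this one routine changes the
-    multiple inheritance semantics */
- #include <string.h>
- typedef struct {
-     int    nfields;
-     char **field_names;
- } TYPEDESC;
- /* returns 0 if fname may be inherited, -1 on an ambiguity */
- int resolve_inherited_field(char *fname, TYPEDESC *parents, int nparents)
- {
-     int p, f, seen = 0;
-     for (p = 0; p < nparents; p++)
-         for (f = 0; f < parents[p].nfields; f++)
-             if (strcmp(parents[p].field_names[f], fname) == 0)
-                 seen++;
-     return (seen > 1) ? -1 : 0;     /* two parents define it: refuse */
- }
- .)l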
- .pp
- We now turn to the POSTGRES notion of functions. There are three different
- classes of POSTGRES functions,
- .(l
- normal functions
- operators
- POSTQUEL functions
- .)l
- and we discuss each in turn.
- .pp
- A user can define an arbitrary collection of
- .b normal
- .b functions
- whose operands are base types or constructed types. For example,
- he can define
- a function,
- area,
- which maps an instance of a
- polygon into an instance of a floating point number.
- Such functions are automatically available in the query language
- as illustrated in the following example:
- .(l
- retrieve (DEPT.dname) where area (DEPT.floorspace) > 500
- .)l
- Normal functions can be defined to POSTGRES while the system
- is running and are dynamically loaded
- when required during query execution.
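- For example, the area function might be written along the following
- lines; the polygon representation shown is an assumption, since only
- the shape of the interface matters here.
- .(l
- /* a sketch of a user-written normal function: map a polygon to a float */
- typedef struct {
-     int     npoints;
-     double *x;
-     double *y;
- } POLYGON;
- double area(POLYGON *poly)
- {
-     double sum = 0.0;
-     int    i, j;
-     /* shoelace formula over the vertices */
-     for (i = 0; i < poly->npoints; i++) {
-         j = (i + 1) % poly->npoints;
-         sum += poly->x[i] * poly->y[j] - poly->x[j] * poly->y[i];
-     }
-     return (sum < 0 ? -sum : sum) / 2.0;
- }
- .)l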
- .pp
- Functions are allowed on constructed types, e.g:
- .(l
- retrieve (EMP.name) where overpaid (EMP)
- .)l
- In this case overpaid has an operand of type EMP and returns a boolean.
- Functions whose operands are constructed types are inherited down the type hierarchy
- in the standard way.
- .pp
- Normal functions are arbitrary procedures written in a general purpose programming
- language (in our case C or LISP). Hence, they have arbitrary semantics and can
- run other POSTQUEL commands
- during execution. Therefore, queries with normal
- functions in the qualification
- cannot be optimized by the POSTGRES query optimizer. For example, the above
- query on overpaid employees will
- result in a sequential
- scan of all employees.
- .pp
- To utilize indexes in processing queries, POSTGRES
- supports a second class of functions, called
- .b operators.
- Operators are functions with one or two operands which use the standard operator
- notation in the query language. For example the following query looks for
- departments whose floor space has a greater area than that of a specific polygon:
- .(l
- retrieve (DEPT.dname) where DEPT.floorspace AGT polygon["(0,0), (1,1), (0,2)"]
- .)l
- The "area greater than" operator AGT is defined by indicating the token to use in the query language
- as well as the function to call to evaluate the operator. Moreover, several
- .b hints
- can also be included in the definition which assist the query optimizer. One of these
- hints is that ALE is the negator of this operator. Therefore, the query optimizer
- can transform the query:
- .(l
- retrieve (DEPT.dname) where not (DEPT.floorspace ALE polygon["(0,0), (1,1), (0,2)"])
- .)l
- which cannot be optimized, into the one above, which can be.
- In addition, the design of the POSTGRES access methods allows a B+-tree index to
- be constructed for the instances of
- floorspace appearing in DEPT records. This index can support efficient access for the
- .b class
- of operators {ALT, ALE, AE, AGT, AGE}. Information on the access paths
- available to the various operators is recorded in the POSTGRES system
- catalogs.
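- The evaluation function bound to AGT can be as simple as the following
- sketch, which reuses the area function above; the polygon
- representation is again an assumption.
- .(l
- /* evaluation function for the "area greater than" operator */
- typedef struct {
-     int     npoints;
-     double *x;
-     double *y;
- } POLYGON;
- extern double area(POLYGON *poly);     /* the normal function shown earlier */
- int polygon_agt(POLYGON *left, POLYGON *right)
- {
-     return area(left) > area(right);   /* boolean result */
- }
- .)l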
- .pp
- As pointed out in [STON87b] it is imperative that a user be able
- to construct new access methods to provide efficient access to instances of non-traditional
- base types. For example, suppose a user introduces a new operator "!!" defined on
- polygons that returns true if two polygons overlap. Then, he might ask a query such as:
- .(l
- retrieve (DEPT.dname) where DEPT.floorspace !! polygon["(0,0), (1,1), (0,2)"]
- .)l
- There is no B+-tree or hash access method that will allow this query
- to be rapidly executed. Rather, the query must be supported by some
- multidimensional access method such as R-trees, grid files, K-D-B trees, etc. Hence,
- POSTGRES was designed to allow new access methods to be written by POSTGRES users and then dynamically
- added to the system. Basically, an access method to POSTGRES is a collection
- of 13 normal functions which perform record level
- operations such as fetching the next record in a scan, inserting
- a new record, deleting a specific record, etc.
- All a user need do is define implementations for each of these
- functions and make a collection of entries in the system catalogs.
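- A plausible (and deliberately incomplete) picture of such a collection
- is a table of function pointers along the following lines; the routine
- names are ours, not the actual POSTGRES ones.
- .(l
- /* an illustrative subset of the 13 record-level routines that make up
-    an access method; the real names and signatures differ */
- typedef struct {
-     int (*am_beginscan)(char *relname);           /* start a scan          */
-     int (*am_getnext)(int scanid, void *tuple);   /* fetch next record     */
-     int (*am_insert)(char *relname, void *tuple); /* insert a new record   */
-     int (*am_delete)(char *relname, long tid);    /* delete a given record */
-     int (*am_endscan)(int scanid);                /* finish the scan       */
-     /* ... the remaining record-level routines ... */
- } ACCESSMETHOD;
- .)l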
- .pp
- Operators are only available for operands which are base types because
- access methods traditionally support fast access to specific fields in records.
- It is unclear what an access method for a constructed type should do, and therefore
- POSTGRES does not include this capability.
- .pp
- The third kind of function available in POSTGRES is
- .b POSTQUEL
- .b functions.
- Any collection of commands in the POSTQUEL query language can be packaged together
- and defined as a function. For example, the following function defines
- the overpaid employees:
- .(l
- define function high-pay as retrieve (EMP.all) where EMP.salary > 50000
- .)l
- POSTQUEL functions can also have parameters, for example:
- .(l
- define function ret-sal as retrieve (EMP.salary) where EMP.name = $1
- .)l
- Notice that
- ret-sal has one parameter in the body of the function, the name
- of the person involved.
- Such parameters must be provided at the time the function is called.
- A third example POSTQUEL function is:
- .(l
- define function set-of-DEPT as retrieve (DEPT.all) where DEPT.floor = $.floor
- .)l
- This function has a single parameter "$.floor". It is
- expected to appear in a record and receives the value of its parameter from
- the floor field defined elsewhere in the same record.
- .pp
- Each POSTQUEL function is automatically
- a constructed type. For example, one can define a FLOORS type as follows:
- .(l
- create FLOORS (floor = i2, depts = set-of-DEPT)
- .)l
- This constructed type uses the set-of-DEPT function as a constructed type. In this
- case, each instance of FLOORS has a value for depts which is the
- value of the function set-of-DEPT for that record.
- .pp
- In addition, POSTGRES allows a user to form a constructed type, one or
- more of whose fields has the special type POSTQUEL.
- For example, a user can
- construct the following type:
- .(l
- create PERSON (name = c12, hobbies = POSTQUEL)
- .)l
- In this case, each instance of hobbies contains a different
- POSTQUEL function, and therefore each person
- has a name and a POSTQUEL function that defines his particular hobbies. This
- support for POSTQUEL as a type allows the system to simulate
- non-normalized
- relations as found in NF**2 [DADA86].
- .pp
- POSTQUEL functions can appear in the query language in the same manner as
- normal functions. The following example ensures that Joe has
- the same salary as Sam:
- .(l
- replace EMP (salary = ret-sal("Joe")) where EMP.name = "Sam"
- .)l
- .pp
- In addition, since POSTQUEL functions are a constructed type,
- queries can be executed against POSTQUEL functions just like other
- constructed types. For example,
- the following query can be run on the constructed type, high-pay:
- .(l
- retrieve (high-pay.salary) where high-pay.name = "george"
- .)l
- If a POSTQUEL function contains a single retrieve command, then
- it is very similar to a relational view definition, and this
- capability
- allows retrieval operations to be performed on
- objects which are essentially relational views.
- .pp
- Lastly, every time a user defines a constructed type, a POSTQUEL function is
- automatically defined with the same name. For example, when DEPT is constructed,
- the following function is automatically defined:
- .(l
- define function DEPT as retrieve (DEPT.all) where DEPT.OID = $1
- .)l
- When EMP was defined earlier in this section, it contained
- a field dept which was of type DEPT. In fact, DEPT
- was the above automatically defined POSTQUEL function.
- As a result, each constructed type is itself available as a type because
- POSTGRES automatically defines a POSTQUEL function for each such type.
- .pp
- POSTQUEL functions are a very powerful notion because they allow
- arbitrary collections of instances of types to be returned as the value of the
- function. Since POSTQUEL functions can reference other POSTQUEL functions,
- arbitrary structures of complex objects can be assembled.
- Lastly, POSTQUEL functions allow collections of commands
- such as the 5 SQL commands that make up
- TP1 [ANON85] to be assembled into a
- single function and stored inside the DBMS.
- Then, one can execute TP1 by executing the single function. This
- approach is preferred
- to having to
- submit the 5 SQL commands in TP1 one by one from an application program.
- Using a POSTQUEL function, one replaces 5 round trips between the application
- and the DBMS with 1, which results in a 25% performance improvement in a
- typical OLTP application.
- .sh 2 "The POSTGRES Query Language"
- .pp
- The previous
- section presented several
- examples of the POSTQUEL language.
- It
- is a set-oriented query language that resembles a superset of a relational query language.
- Besides user defined functions and operators which were
- illustrated earlier, the features which have been added to a traditional relational language include:
- .(l
- path expressions
- support for nested queries
- transitive closure
- support for inheritance
- support for time travel
- .)l
- .pp
- Path expressions are included because
- POSTQUEL allows constructed types which contain other
- constructed types to be hierarchically referenced. For example, the EMP type
- defined above contains a field which is an instance of the constructed type, DEPT. Hence, one
- can ask for the names of employees who work on the first floor as follows:
- .(l
- retrieve (EMP.name) where EMP.dept.floor = 1
- .)l
- rather than being forced to do a join, e.g:
- .(l
- retrieve (EMP.name) where EMP.dept = DEPT.OID and DEPT.floor = 1
- .)l
- .pp
- POSTQUEL also allows queries to be nested and has operators
- that have sets of instances as operands. For example, to find the departments
- which occupy an entire floor, one would query:
- .(l
- retrieve (DEPT.dname)
- where DEPT.floor NOTIN {D.floor from D in DEPT where D.dname != DEPT.dname}
- .)l
- In this case, the expression inside the curly braces represents a set of instances and
- NOTIN is an operator which takes a set of instances as its right operand.
- .pp
- The transitive closure operation allows one to explode a parts or ancestor hierarchy. Consider
- for example the constructed type:
- .(l
- parent (older, younger)
- .)l
- One can ask for all the ancestors of John as follows:
- .(l
- retrieve* into answer (parent.older)
- using a in answer
- where parent.younger = "John"
- or parent.younger = a.older
- .)l
- In this case the * after retrieve indicates that the associated query should be run until
- answer fails to grow.
- .pp
- If one wishes to find the names of all employees over 40, one would
- write:
- .(l
- retrieve (E.name) using E in EMP
- where E.age > 40
- .)l
- On the other hand, if one wanted the names of all salesmen or employees
- over 40, the notation is:
- .(l
- retrieve (E.name) using E in EMP*
- where E.age > 40
- .)l
- Here the * after the constructed type EMP indicates
- that the query should be run over EMP and all constructed types
- under EMP in the inheritance hierarchy.
- This use of * allows a user to easily run queries
- over a constructed
- type and all its descendants.
- .pp
- Lastly, POSTGRES supports the notion of
- .b time
- .b travel.
- This feature allows a user to run historical queries. For example to find the salary
- of Sam at time T one would query:
- .(l
- retrieve (EMP.salary)
- using EMP [T]
- where EMP.name = "Sam"
- .)l
- POSTGRES will automatically find the version of Sam's record valid at
- the correct time and get the appropriate salary.
- .pp
- Like relational systems, the result of a POSTQUEL command
- can be added to the data base as a new constructed type. In this case, POSTQUEL follows the
- lead of relational systems by removing duplicate records from the result. The user who is
- interested in retaining
- duplicates can do so by ensuring that the OID field of some instance is included in the
- target list being selected.
- For a full description of POSTQUEL the interested reader should consult [WENS88].
- .sh 2 "Fast Path"
- .pp
- There are three reasons why we chose to implement a
- .b fast
- .b path
- feature.
- First, a user who wishes to interact with a data base by executing a
- sequence of functions to navigate to desired data can use fast path to accomplish
- his objective.
- Second, there are a variety of decision support
- applications in which the
- end user is given a specialized query language. In such environments,
- it is often easier for the application developer to construct a
- parse tree representation for a query rather than an ASCII
- one. Hence, it would be desirable for the application
- designer to be able to directly call the POSTGRES optimizer or
- executor.
- Most DBMSs do not allow direct access to internal system modules.
- .pp
- The third reason is a bit more complex. In the persistent CLOS
- layer of PICASSO, it is necessary for the run time system to assign a unique
- identifier (OID) to every constructed object that is persistent. It is
- undesirable for the system to synchronously insert each object directly
- into a POSTGRES data base and thereby assign a POSTGRES identifier to
- the object. This would result in poor performance in executing a
- persistent CLOS program. Rather, persistent CLOS maintains a cache of
- objects in the address space of the program and only inserts a persistent
- object into this cache synchronously. There are several options
- which control how the cache is written out to the data base at a later time.
- Unfortunately, it is essential that a persistent object be assigned a unique
- identifier at the time it enters the cache, because other objects may
- have to point
- to the newly created object and use its OID to do so.
- .pp
- If persistent CLOS assigns unique identifiers, then there will be a complex
- mapping that must be performed when objects are written out to the data
- base and real POSTGRES unique identifiers are assigned. Alternately,
- persistent CLOS must maintain its own system for unique identifiers,
- independent of the POSTGRES one, an obvious duplication of effort.
- The solution chosen was to allow persistent CLOS to access the POSTGRES
- routine that assigns unique identifiers and allow it to preassign
- N POSTGRES object identifiers which it can subsequently assign to cached
- objects. At a later time, these objects can be written to a POSTGRES
- data base using the preassigned unique identifiers. When the supply of
- identifiers is exhausted, persistent CLOS can request another collection.
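- The client side of this arrangement reduces to a small allocator such
- as the sketch below; the preassignment entry point is hypothetical and
- stands in for the actual POSTGRES routine.
- .(l
- /* hand out preassigned object identifiers from a cached batch */
- #define OID_BATCH 1024
- typedef long oid_t;
- extern oid_t preassign_oids(int n);   /* hypothetical fast path call      */
- static oid_t next_oid;                /* next unassigned identifier       */
- static int   oids_left = 0;           /* identifiers remaining in batch   */
- oid_t new_object_oid(void)
- {
-     if (oids_left == 0) {             /* batch exhausted: request another */
-         next_oid  = preassign_oids(OID_BATCH);
-         oids_left = OID_BATCH;
-     }
-     oids_left--;
-     return next_oid++;
- }
- .)l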
- .pp
- In all of these examples, an application program requires direct
- access to a user-defined or internal POSTGRES function, and therefore the POSTGRES
- query
- language has been extended with:
- .(l
- function-name (param-list)
- .)l
- In this case, besides running queries in POSTQUEL, a user can
- ask that any function known to POSTGRES be executed.
- This function can be one
- that a user has previously
- defined as a normal, operator, or POSTQUEL function
- or it can be one that is included in the POSTGRES implementation.
- .pp
- Hence, the user can directly call the parser, the optimizer, the
- executor, the access methods, the buffer manager or the utility routines.
- In addition he can define functions which in turn make calls on POSTGRES
- internals. In this way, he can have considerable
- control over the low level flow of control, much as is
- available
- through a DBMS toolkit such as Exodus [RICH87], but without
- all the effort involved in configuring
- a tailored DBMS from the toolkit.
- Moreover, should the user wish to interact with his data base
- by making a collection of function calls (method invocations), this
- facility allows the possibility. As noted in the introduction, we do not expect this
- interface to be especially popular.
- .pp
- The above capability
- is called
- .b fast
- .b path
- because it provides direct access to specific functions
- without checking the validity of parameters. As such, it is effectively
- a remote procedure call facility and allows a user program to
- call a function in another address space rather than in its own
- address space.
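- .pp
- From an application program, a fast path call therefore amounts to
- submitting the function invocation itself as the command string, as in
- the sketch below; the client-side routine name is hypothetical.
- .(l
- #include <stdio.h>
- extern char *postgres_exec(char *command);  /* hypothetical client call */
- char *polygon_area_via_fast_path(char *polygon_constant)
- {
-     static char command[256];
-     /* a bare "function-name (param-list)" is sent in place of a query;
-        the named function runs in the server address space and only the
-        result crosses the connection */
-     sprintf(command, "area(%s)", polygon_constant);
-     return postgres_exec(command);
- }
- .)l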
- .sh 2 "Is POSTGRES Object-oriented?"
- .pp
- There have been many next generation data models proposed in the last few years.
- Some are characterized by the
- term "extended relational", others are considered "object-oriented" while
- yet others are termed "nested relational".
- POSTGRES could be accurately described as an object-oriented system because it
- includes unique identity for objects, abstract data types, classes (constructed types),
- methods (functions), and inheritance for both data and functions.
- Others (e.g. [ATKI89]) are suggesting definitions
- for the word "object-oriented", and POSTGRES satisfies
- virtually all of the proposed litmus tests.
- .pp
- On the other hand, POSTGRES could also be considered an extended relational system. As noted in
- a previous footnote, Section
- 2 could have been equally well written with the words "constructed type" and
- "instance" replaced by "relation" and "tuple".
- In fact, in previous descriptions of POSTGRES [STON86], this notation
- was employed. Hence, others, e.g. [MAIE89] have characterized POSTGRES as
- an extended relational system.
- .pp
- Lastly, POSTGRES supports the POSTQUEL type, which is
- exactly a nested relational structure.
- Consequently, POSTGRES could be classified as a nested relational system as well.
- .pp
- As a result POSTGRES could be described using any of the three adjectives above. In our opinion
- we can interchangeably use the words
- .b relations,
- .b classes,
- and
- .b constructed
- .b types
- in describing POSTGRES.
- Moreover, we can also interchangeably use the words
- .b function
- and
- .b method.
- Lastly, we can interchangeably use the words
- .b instance,
- .b record,
- and
- .b tuple.
- Hence, POSTGRES seems to be either object-oriented or not
- object-oriented, depending on the choice of a few tokens in the parser. As a
- result, we feel that most
- of the efforts to classify the extended data models in next generation data base systems
- are silly exercises in surface syntax.
- .pp
- In the remainder of this section, we comment briefly on the POSTGRES implementation of
- OIDs and inheritance.
- POSTGRES gives each record a unique identifier (OID), and then allows the
- application designer to decide for each constructed type whether he
- wishes to have an index on the OID field.
- This decision should be contrasted with most object-oriented systems
- which construct an OID index for all constructed types in the system automatically.
- The POSTGRES scheme allows the cost of the index to be
- paid only for those types of objects
- for which it is profitable. In our opinion, this flexibility has been
- an excellent decision.
- .pp
- Second, there are several possible ways to implement an inheritance
- hierarchy. Considering the SALESMAN and
- EMP example noted earlier, one can store
- instances of SALESMAN by storing them as EMP records and then
- only storing the extra quota information in
- a separate SALESMAN record. Alternately, one
- can store no information on each salesman in EMP and then store complete
- SALESMAN records elsewhere. Clearly, there are
- a variety of additional schemes.
- .pp
- POSTGRES chose one implementation, namely storing all SALESMAN fields
- in a single record. However, it is likely that
- application designers will demand several
- other representations to give them the
- flexibility to optimize their particular
- data.
- Future implementations of inheritance will likely require several
- storage options.
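- The two alternatives mentioned above differ only in record layout, as
- the following sketch (with field sizes chosen purely for illustration)
- makes explicit.
- .(l
- /* (1) the POSTGRES choice: all SALESMAN fields stored in one record */
- typedef struct {
-     char   name[12];
-     long   dept_oid;     /* reference to the DEPT instance      */
-     double salary;
-     double quota;        /* the field added by SALESMAN         */
- } SALESMAN_REC;
- /* (2) an alternative: an EMP record plus a small extension record */
- typedef struct {
-     long   emp_oid;      /* points at the underlying EMP record */
-     double quota;        /* only the extra SALESMAN field       */
- } SALESMAN_EXT;
- .)l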
- .sh 2 "A Critique of the POSTGRES Data Model"
- .pp
- There are five areas where we feel we made mistakes in
- the POSTGRES data model:
- .(l
- union types
- access method interface
- functions
- big objects
- arrays
- .)l
- We discuss each in turn.
- .pp
- A desirable feature in any next-generation DBMS would be to support
- union types, i.e. an instance of a type
- can be an
- instance of one of several given types.
- A persuasive example (similar to one from [COPE84]) is that employees
- can be on loan to another plant or on loan to a customer. If two base types, customer
- and plant exist, one would
- like to change the EMP type to:
- .(l
- create EMP (name = c12, dept = DEPT, salary = float, on-loan-to = plant or customer)
- .)l
- Unfortunately including union types makes a query optimizer more complex. For example, to
- find all the employees on loan to the same organization one would state the query:
- .(l
- retrieve (EMP.name, E.name)
- using E in EMP
- where EMP.on-loan-to = E.on-loan-to
- .)l
- However, the optimizer must construct two different plans, one for employees on loan to
- a customer and one for employees on loan to a different plant. The reason
- for two plans is that the equality operator may be different for the
- two types.
- In addition, one must construct indexes on union
- fields, which entails substantial complexity in the access methods.
- .pp
- Union types are
- .b highly
- desirable in certain applications, and
- we considered three possible
- stances with respect to union types:
- .(l
- 1) support only through abstract data types
- 2) support through POSTQUEL functions
- 3) full support
- .)l
- Union types can be easily constructed using the POSTGRES abstract
- data type facility. If a user wants a specific union type, he
- can construct it and then write appropriate operators and functions for
- the type. The implementation complexity of union types is thus forced into
- the routines for the operators and functions and onto
- the implementor of the type. Moreover,
- it is clear that there
- are a vast number of union types and
- an extensive type library must be constructed by the application designer.
- The PICASSO team
- stated that this approach placed an unacceptably difficult burden on them,
- and therefore position 1 was rejected.
- .pp
- Position
- 2 offers some support for union types but has problems. Consider the
- example of employees and their hobbies from [STON86]:
- .(l
- create EMP (name = c12, hobbies = POSTQUEL)
- .)l
- Here the hobbies field is a POSTQUEL function, one per employee, which retrieves
- all hobby information about that particular employee. Now consider the
- following POSTQUEL query:
- .(l
- retrieve (EMP.hobbies.average) where EMP.name = "Fred"
- .)l
- In this case the field average for each hobby record will
- be returned whenever it is defined. Suppose, however, that
- average is a float for the softball hobby and an integer for
- the cricket hobby. In this case, the application program
- must be prepared to accept values of different types.
- .pp
- The more
- difficult problem is the following legal POSTQUEL query:
- .(l
- retrieve into TEMP (result = EMP.hobbies.average) where EMP.name = "Fred"
- .)l
- In this case, a problem arises concerning the type of the result field, because
- it is a union type.
- Hence, adopting
- position 2 leaves one in an awkward position
- of not having a reasonable type for the result of the above
- query.
- .pp
- Of course, position 3 requires extending the indexing and query optimization
- routines to deal with union types.
- Our solution
- was to adopt position 2 and to add an abstract data type, ANY, which
- can hold an instance of
- any type. This solution which turns the type
- of the result of the above query from
- .(l
- one-of {integer, float}
- .)l
- into ANY is not very satisfying. Not only is information lost, but
- we are also forced to include this universal type in POSTGRES.
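- One way to realize such a universal type is as a tagged,
- self-describing value, along the lines of the sketch below; the layout
- is an assumption rather than the actual POSTGRES representation.
- .(l
- /* a tagged value: the type identifier travels with the data */
- typedef struct {
-     long type_oid;      /* identifies the actual base type of the value */
-     int  length;        /* size of the data portion in bytes            */
-     char data[1];       /* the value itself, in its native format       */
- } ANYVALUE;
- .)l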
- .pp
- In our opinion, the only realistic alternative is to adopt position
- 3, swallow the complexity increase, and that is what we would do in any next system.
- .pp
- Another failure concerned the access method design and was
- the decision to support indexing only on the value of a field and
- not on a function of
- a value. The utility of indexes on functions
- of values is discussed in [LYNC88], and the capability was
- retrofitted, rather inelegantly, into one version of POSTGRES [AOKI89].
- .pp
- Another comment on the access method design concerns extendibility.
- Because
- a user can add new base types
- dynamically, it is essential that he also
- be able to add new access methods to POSTGRES
- if the system does not come with an access method
- that supports efficient access
- to his types. The standard example of this capability is the use
- of R-trees [GUTM84] to speed access to geometric objects.
- We have now designed and/or coded three access methods
- for POSTGRES in addition to B+-trees. Our experience has
- consistently been that
- adding an access method is
- .b VERY
- .b HARD.
- There are four
- problems that complicate the situation. First, the access method must
- include explicit calls to the POSTGRES locking subsystem to set and
- release locks on access method objects. Hence, the designer of a new
- access method must understand locking and how to use the particular
- POSTGRES facilities. Second, the designer must understand how to
- interface to the buffer manager and be able to get, put, pin and unpin
- pages. Next, the POSTGRES execution engine contains the ``state'' of the
- execution of any query and the access methods must understand
- portions of this state and the data structures involved.
- Last but not least, the
- designer must write 13 non-trivial routines. Our experience so far is that
- novice programmers can add new types to POSTGRES; however, it requires
- a highly skilled programmer to add a new access method. Put differently, the
- manual on how to add new data types to POSTGRES is 2 pages long; the one for
- access methods is 50 pages long.
- .pp
- We failed to realize the difficulty of access method construction. Hence,
- we designed a system that allows end users to add access methods dynamically to
- a running system. However, in practice access methods will be built by
- sophisticated system programmers, who could have made do with a simpler
- interface that would have been easier for us to build.
- .pp
- A third area where our design is flawed
- concerns POSTGRES support for POSTQUEL functions.
- Currently,
- such functions in POSTGRES are collections of commands in the query
- language POSTQUEL. If one defined budget in DEPT as a
- POSTQUEL function, then the value for the shoe department budget might be the following
- command:
- .(l
- retrieve (DEPT.budget) where DEPT.dname = "candy"
- .)l
- In this case, the shoe department will automatically be
- assigned the same budget as the candy department. However,
- it is impossible for the budget of the shoe department to be specified as:
- .(l
- if floor = 1 then
- retrieve (DEPT.budget) where DEPT.dname = "candy"
- else
- retrieve (DEPT.budget) where DEPT.dname = "toy"
- .)l
- This specification defines the budget of the shoe department to be
- the candy department budget if it is on the first floor.
- Otherwise, it is the same as the toy department.
- This query is not possible because POSTQUEL has no
- conditional expressions. We had
- .b extensive
- discussions
- about this and other extensions to POSTQUEL. Each such extension was
- rejected because it seemed to turn POSTQUEL into a programming
- language and not a query language.
- .pp
- A better solution would be to allow
- a POSTQUEL function to be expressible in a general purpose
- programming language enhanced with POSTQUEL queries. Hence,
- there would be no distinction between normal functions and POSTQUEL
- functions.
- Put differently, normal functions would be able to serve as constructed types
- and would support path expressions.
- .pp
- There are three problems with this approach.
- First, path expressions for normal functions
- cannot be optimized by the POSTGRES query
- optimizer because they have arbitrary semantics. Hence, most
- of the optimizations planned for POSTQUEL
- functions would have to be discarded.
- Second, POSTQUEL functions are much easier to define than normal functions
- because a user need not know a general purpose programming language. Also, he need
- not specify the types of the function arguments or the return type because POSTGRES
- can figure these out from the query specification.
- Hence, we would have to give up ease of definition in moving from
- POSTQUEL functions to normal functions.
- Lastly, normal functions have a protection problem
- because they can do arbitrary things, such as zeroing
- the data base. POSTGRES
- deals with this problem
- by calling normal functions in two ways:
- .(l
- trusted -- loaded into the POSTGRES address space
- untrusted -- loaded into a separate address space
- .)l
- Hence, normal functions are either called quickly with no security or
- slowly in a protected fashion. No such security problem arises with
- POSTQUEL functions.
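- A conceptual sketch of this dispatch is shown below; the catalog entry
- and the protected-call routine are assumptions.
- .(l
- typedef char *(*func_ptr)(char **args);
- typedef struct {
-     char    *name;
-     int      trusted;   /* 1: dynamically loaded into POSTGRES itself */
-     func_ptr fn;        /* valid only for trusted functions           */
- } FUNCENTRY;
- /* hypothetical: run the function in a separate, protected address space */
- extern char *call_in_separate_process(FUNCENTRY *f, char **args);
- char *call_normal_function(FUNCENTRY *f, char **args)
- {
-     if (f->trusted)
-         return f->fn(args);                    /* fast, but no protection */
-     return call_in_separate_process(f, args);  /* slower, protected call  */
- }
- .)l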
- .pp
- A better approach might have been to support POSTQUEL functions
- written in the 4th generation language (4GL)
- being designed for PICASSO [ROWE89]. This programming
- system leaves type information in the system catalogs. Consequently,
- there would be no need for a separate registration step to
- indicate type information to POSTGRES. Moreover, a
- processor for the language is available for integration in POSTGRES.
- It is also easy to make a 4GL "safe",
- i.e. unable to perform wild branches or malicious actions.
- Hence, there would be no security problem.
- Also, it seems possible that path expressions could be optimized for 4GL
- functions.
- .pp
- Current
- commercial relational products seem to be moving in this
- direction by allowing data base
- procedures to be coded in their proprietary 4th generation languages (4GLs).
- In retrospect we probably should have looked seriously at designing POSTGRES to support
- functions written in a 4GL.
- .pp
- Next, POSTGRES allows types to be constructed that are of arbitrary size.
- Hence,
- large bitmaps are a perfectly acceptable POSTGRES data type. However,
- the current POSTGRES user interface (portals) allows a user to fetch one or
- more instances of a constructed type. It is currently impossible to fetch
- only a portion of an instance. This presents
- an application program with a severe buffering problem; it must be capable
- of accepting an entire instance, no matter how large it is. We should
- extend the portal syntax in a straightforward way to allow
- an application to position a portal on a specific field of an instance of a constructed type
- and then
- specify a byte-count that it would like to retrieve. These changes would
- make it much easier to insert and retrieve big fields.
- .pp
- Lastly, we included arrays in the POSTGRES data model. Hence, one could
- have specified
- the SALESMAN type as:
- .(l
- create SALESMAN (name = c12, dept = DEPT, salary = float, quota = float[12])
- .)l
- Here, the SALESMAN has all the fields of
- EMP plus a quota which is an array of 12 floats, one for each month of the year.
- In fact, character strings are really an array of characters, and the correct
- notation for the above type is:
- .(l
- create SALESMAN (name = c[12], dept = DEPT, salary = float, quota = float[12])
- .)l
- In POSTGRES we support fixed and variable length arrays of base types, along
- with an array notation in POSTQUEL. For example to request all salesmen
- who have an April quota over 1000, one would write:
- .(l
- retrieve (SALESMAN.name) where SALESMAN.quota[4] > 1000
- .)l
- However, we do not support arrays of constructed types; hence it is not
- possible to have an array of instances of a constructed type.
- We omitted this capability only because
- it would have made the query optimizer and executor somewhat harder.
- In addition, there is no built-in search mechanism for the elements of an array.
- For example, it is not possible to find the names of all salesmen who have
- a quota over 1000 during any month of the year.
- In retrospect, we should have included either general support for arrays or no support at all.
- .sh 1 "THE RULES SYSTEM"
- .sh 2 "Introduction"
- .pp
- It is clear to us that all DBMSs need a rules system. Current commercial
- systems are required to support referential integrity [DATE81], which
- is merely a simple-minded collection of rules. In addition, most current
- systems have special purpose rules systems to support relational views,
- protection, and integrity constraints. Lastly, a rules system allows
- users to do event-driven programming as well as enforce integrity
- constraints that cannot be performed in other ways.
- There are three high level decisions that the POSTGRES
- team had to make concerning the philosophy of rule systems.
- .pp
- First, a decision was required concerning how many rule syntaxes
- there would be. Some approaches, e.g. [ESWA76, WIDO89] propose
- rule systems oriented toward application designers that would augment
- other rule systems present for DBMS internal purposes. Hence,
- such systems would contain several independently
- functioning rules systems. On the other hand, [STON82] proposed a rule
- system that tried to support user functionality as well as needed
- DBMS internal functions in a single syntax.
- .pp
- From the beginning, a goal of the POSTGRES rules system was to have only
- one syntax. It was felt that this would simplify the user interface, since
- application designers need learn only one construct. Also, they would not
- have to deal with deciding which system to use in the cases where
- a function could be performed by more than one rules system.
- It was also felt that a single rules system
- would ease the implementation
- difficulties that would be faced.
- .pp
- Second, there are two implementation philosophies by which one could
- support a rule system. The first is a
- .b query
- .b rewrite
- implementation.
- Here, a rule would be applied by
- converting a user query to an alternate form
- prior to execution. This transformation is performed between the
- query language parser and the optimizer. Support for views [STON75]
- is done this way along with many of the proposals for recursive query
- support [BANC86, ULLM85]. Such an implementation
- will be very efficient when there are a small number of rules on any
- given constructed type and most rules cover the whole constructed type. For example,
- a rule such as:
- .(l
- EMP [dept] contained-in DEPT[dname]
- .)l
- expresses the referential integrity condition that employees cannot be in a
- non-existent department and
- applies to all EMP instances.
- However, a query rewrite
- implementation will not work well if there are a large number of rules on
- each constructed type, each of them covering
- only a few instances. Consider, for example, the following three rules:
- .(l
- employees in the shoe department have a steel desk
- employees over 40 have a wood desk
- employees in the candy department do not have a desk
- .)l
- To retrieve the kind of desk that Sam has, one must run the
- following three queries:
- .(l
- retrieve (desk = ``steel'') where EMP.name = ``Sam'' and EMP.dept = ``shoe''
- retrieve (desk = ``wood'') where EMP.name = ``Sam'' and EMP.age > 40
- retrieve (desk = null) where EMP.name = ``Sam'' and EMP.dept = ``candy''
- .)l
- Hence, a user query must be rewritten for each rule, resulting
- in a serious degradation of performance unless all
- queries are processed as a group using multiple query optimization
- techniques [SELL86].
- .pp
- Moreover,
- a query rewrite system has great difficulty with
- .b exceptions
- [BORG85].
- For example consider the rule ``all employees have a
- steel desk'' together with the exception ``Jones is an employee who has a
- wood desk''.
- If one asks for the kind of desk and age for all employees over 35,
- then the query must be
- rewritten as the following 2 queries:
- .(l
- retrieve (desk = "steel", EMP.age) where EMP.age > 35 and EMP.name != "Jones"
- retrieve (desk = "wood", EMP.age) where EMP.age > 35 and EMP.name = "Jones"
- .)l
- In general, the number of queries as well as the complexity of their
- qualifications increases linearly with the number
- of rules. Again, this will result in bad performance unless
- multiple query optimization techniques are applied.
- .pp
- Lastly, a query rewrite system
- does not offer any help in resolving situations when the rules are
- violated. For example, the above referential integrity rule is
- silent on what to
- do if a user tries to insert an employee into a non-existent department.
- .pp
- On the other hand, one could adopt a
- .b trigger
- implementation based on individual record accesses and updates
- to the data base. Whenever a record is accessed, inserted, deleted
- or modified, the low level execution code has both the old record and the
- new record readily available. Hence, assorted actions can
- easily be taken by the
- low level code.
- Such an implementation requires the rule firing code to be placed deep in
- the query execution routines. It
- will work well if there are many rules each affecting only a few instances,
- and it is easy to deal successfully with conflict resolution at this
- level.
- However, rule firing is deep in the executor, and it is thereby impossible for
- the query optimizer to construct an efficient execution plan for
- a chain of rules that are awakened.
- .pp
- Hence, this implementation complements a query rewrite scheme in
- that it excels where a rewrite scheme
- is weak and vice versa.
- Since we wanted to have a single rule system,
- it was
- clear that we needed to provide both styles of implementation.
- .pp
- A third issue that we faced was the paradigm for the rules system. A
- conventional production system consisting of collections
- of if-then rules has been explored in the past [ESWA76, STON82],
- and is a readily available alternative. However, such a scheme lacks
- expressive power. For example, suppose one wants to enforce a rule
- that Joe makes the same salary as Fred. In this case, one must
- specify two different if-then rules. The first one indicates the
- action to take if Fred receives a raise, namely to propagate the
- change on to Joe. The second rule specifies that any update
- to Joe's salary must be refused. Hence, many user rules require two or more
- if-then specifications to achieve the desired effect.
- .pp
- The intent in POSTGRES
- was to explore a more powerful paradigm.
- Basically,
- any POSTGRES command can be turned into a rule by changing the semantics of the
- command so that it is logically either
- .b always
- running or
- .b never
- running. For example, Joe may be specified to have the same
- salary as Fred by the rule:
- .(l
- always replace EMP (salary = E.salary)
- using E in EMP
- where EMP.name = "Fred" and E.name = "Joe"
- .)l
- This single specification will propagate Fred's salary
- on to Joe as well as refuse direct updates to Joe's salary.
- In this way a single ``always'' rule
- replaces the two statements needed in a production
- rule syntax.
- .pp
- Moreover, to efficiently support the
- triggering implementation where there are a large number of rules present for
- a single constructed type, each of which applies to only a few
- instances, the POSTGRES team designed a sophisticated
- marking scheme whereby rule wake-up information is placed on individual
- instances. Consequently, regardless of the number of rules present for
- a single constructed type, only those which actually must fire will be awakened.
- This should be contrasted to proposals without such data structures,
- which will be hopelessly inefficient whenever a large
- number of rules are present for a single constructed type.
- .pp
- Lastly, the decision was made to support the query rewrite scheme by
- escalating markers to the constructed type level.
- For example, consider the
- rule:
- .(l
- always replace EMP (age = 40) where name != "Bill"
- .)l
- This rule applies to all employees except Bill and it would be
- a waste of space to
- mark each individual employee. Rather, one would prefer to set
- a single marker in the system catalogs to
- cover the whole constructed type implicitly.
- In this case, any query, e.g:
- .(l
- retrieve (EMP.age) where EMP.name = "Sam"
- .)l
- will be altered prior to execution by the query rewrite implementation
- to:
- .(l
- retrieve (age = 40) where EMP.name = "Sam" and EMP.name != "Bill"
- .)l
- .pp
- At the current time much of the POSTGRES Rules System (PRS)
- as described in [STON88]
- is operational, and there are three aspects of the design
- which we wish to
- discuss
- in the next three subsections, namely:
- .(l
- complexity
- absence of needed function
- efficiency
- .)l
- Then, we close with the second version of the POSTGRES Rules
- System (PRS II), which we are currently designing. This rules system
- is described in
- more detail in [STON89, STON89b].
- .sh 2 "Complexity"
- .pp
- The first problem with PRS is that the implementation
- is
- exceedingly complex. It is difficult to explain the marking mechanisms that
- cause rule wake-up even to a sophisticated person. Moreover, some of
- us have an uneasy feeling that the implementation may not be
- quite correct. The fundamental problem can be illustrated using the
- Joe-Fred example above.
- First, the rule must be awakened and
- run whenever Fred's salary changes. This requires that one kind of marker
- be placed on the salary of Fred. However, if Fred is
- given a new
- name, say Bill, then the rule must be deleted and reinstalled. This
- requires a second kind of marker on the name of Fred. Additionally,
- it is inappropriate to allow any update to Joe's salary; hence a third
- kind of marker is required on that field. Furthermore, if Fred
- has not yet been hired, then the rule must take effect on the
- insertion of his record. This requires a marker to be placed in the
- index for employee names. To support rules that deal
- with ranges of values, for example:
- .(l
- always replace EMP (age = 40)
- where EMP.salary > 50000 and EMP.salary < 60000
- .)l
- we require that two ``stub'' markers be placed in the index to denote
- the ends of the scan. In addition, each intervening index record must
- also be marked.
- Ensuring that all markers are correctly installed and
- appropriate actions taken when record accesses and updates occur
- has been a challenge.
- .pp
- Another source of substantial complexity is the necessity to
- deal with priorities.
- For example, consider a
- second rule:
- .(l
- always replace EMP (age = 50) where EMP.dept = "shoe"
- .)l
- In this case a highly paid shoe department employee would be
- given two different ages. To alleviate this situation, the second
- rule could be given a higher priority, e.g:
- .(l
- always replace EMP (age = 50) where EMP.dept = "shoe"
- priority = 1
- .)l
- The default priority for rules is 0; hence the first rule would set
- the age of highly paid employees to 40 unless they were in the
- shoe department, in which case their age would be set to 50 by the second
- rule. Priorities, of course, add complications to the rules system.
- For example, if the second rule above is deleted, then the first rule
- must be awakened to correct the ages of employees in the shoe
- department.
- .pp
- Another aspect of complexity is our decision to support both early
- and late evaluation of rules. Consider the example rule that
- Joe makes the same salary as Fred. This rule can be awakened when
- Fred gets a salary adjustment, or activation can be delayed until
- a user requests the salary of Joe. In the second case activation is
- delayed as long as possible, and we term this
- .b late
- evaluation, while the former case is termed
- .b early
- evaluation. This flexibility also results in substantial
- extra complexity. For example, certain rules cannot be activated
- late. If salaries of employees are indexed, then the
- rule that sets Joe's salary to that of Fred must be activated early
- because the index must be kept correct. Moreover, it is impossible
- for an early rule to read data that is written by a late rule. Hence,
- additional restrictions must be imposed.
- .pp
- Getting PRS correct has entailed
- uncounted hours of discussion and considerable implementation
- complexity.
- The bottom line is that the implementation of a rule system that is
- clean and simple to the user is, in fact, extremely complex and
- tricky. Our personal feeling is that we should have embarked on a
- more modest rules system.
- .sh 2 "Absence of Needed Function"
- .pp
- The definition of a
- .b useful
- rules system is one that can handle
- at least all of the following problems in one integrated system:
- .(l
- support for views
- protection
- referential integrity
- other integrity constraints
- .)l
- We focus in this section on support
- for views. The query rewrite implementation
- of a rules system
- should be able to translate queries on views into queries on real objects.
- In addition, updates to views should be similarly mapped to updates on
- real objects.
- .pp
- There are various special cases of view support that can be
- performed by PRS, for example materialized views.
- Consider the following view definition:
- .(l
- define view SHOE-EMP (name = EMP.name, age = EMP.age, salary = EMP.salary)
- where EMP.dept = ``shoe''
- .)l
- The following two PRS rules specify a materialization of this view:
- .(l
- always append to SHOE-EMP (name = EMP.name, age = EMP.age, salary = EMP.salary) where EMP.dept = ``shoe''
- always delete SHOE-EMP where SHOE-EMP.name not-in {EMP.name where EMP.dept = ``shoe''}
- .)l
- In this case, SHOE-EMP will always contain a correct materialization
- of the shoe department employees,
- and queries can be directed to this materialization.
- .pp
- However, there seemed to be no way to support updates on views that are not
- materialized. One of us has spent countless hours attempting to
- support this function through PRS and failed. Hence,
- the inability to support operations on conventional views is a major weakness of PRS.
- .sh 2 "Implementation Efficiency"
- .pp
- The current POSTGRES implementation uses markers on individual fields
- to support rule activation. The only escalation supported
- is to convert a collection of field level markers to a single
- marker on the entire constructed type. Consequently, if a rule
- covers a single instance, e.g:
- .(l
- always replace EMP (salary = 1000) where EMP.name = "Sam"
- .)l
- then a total of 3 markers will be set, one in the index, one
- on the salary field and one on the name field. Each marker is
- composed of:
- .(l
- rule-id -- 6 bytes
- priority -- 1 byte
- marker-type -- 1 byte
- .)l
- Consequently, the marker overhead for this rule is 24 bytes.
- Now consider a more complex rule:
- .(l
- always replace EMP (salary = 1000) where EMP.dept = "shoe"
- .)l
- If 1000 employees work in the shoe department, then 24,000 bytes of
- overhead will be consumed in markers. The only other option is
- to escalate to a marker on the
- entire constructed type, in which case the rule will be
- activated if any salary is read or written
- and not just for employees in the shoe department. This will be
- an overhead intensive option. Hence, for rules which cover
- many instances but not a significant fraction of all instances, the POSTGRES
- implementation will not be very space efficient.
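- .pp
- To make the space arithmetic above concrete, the following is a minimal
- C sketch of one such field-level marker under the byte layout just
- given. The structure and all names in it are ours, for illustration
- only, and do not reflect the actual POSTGRES declarations.
- .(l
- #include <assert.h>
- #include <stdint.h>
-
- /* One rule marker: a 6-byte rule identifier, a 1-byte priority and
-    a 1-byte marker type, for 8 bytes in all. */
- typedef struct {
-     uint8_t rule_id[6];   /* which rule this marker awakens */
-     uint8_t priority;     /* default priority is 0 */
-     uint8_t marker_type;  /* distinguishes the kinds of markers */
- } rule_marker;
-
- int main(void)
- {
-     /* A rule covering a single instance sets three markers: one in
-        the index, one on the salary field and one on the name field. */
-     assert(sizeof(rule_marker) == 8);
-     assert(3 * sizeof(rule_marker) == 24);
-     return 0;
- }
- .)l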
- .pp
- We are considering several solutions to this problem. First, we have
- generalized
- B+-trees to efficiently store interval data as well as point data. Such
- ``segmented B+-trees'' are the subject of a separate paper [KOLE89]. This
- will remove the space overhead in the index for the dominant form of
- access method. Second, to lower the overhead on data records, we will
- probably implement markers at the physical block level as well as at the
- instance and constructed type levels. The appropriate extra granularities are
- currently under investigation.
- .sh 2 "The Second POSTGRES Rules System"
- .pp
- Because of the inability of the current rules paradigm to support
- views and, to a lesser extent, the fundamental complexity
- of the implementation, we are converting to a second POSTGRES
- rules system (PRS II).
- This rules system has much in common with the first
- implementation, but
- returns to the traditional production rule
- paradigm to obtain
- sufficient control to perform view updates correctly.
- This section
- outlines our thinking, and a complete proposal appears in [STON89b].
- .pp
- The production rule syntax we are using in PRS II
- has the form:
- .(l
- ON event TO object WHERE POSTQUEL-qualification
- THEN DO POSTQUEL-command(s)
- .)l
- Here, event is
- retrieve,
- replace,
- delete,
- append,
- new (i.e. replace or append) or
- old (i.e. delete or replace).
- Moreover, object is either
- the name of a constructed type or
- constructed-type.column.
- POSTQUEL-qualification is a normal qualification, with no additions or
- changes. Lastly, POSTQUEL-commands is a set of POSTQUEL commands
- with the following two changes:
- .(l
- NEW, OLD or CURRENT can appear instead of the name of a constructed type in front of
- any attribute.
-
- refuse (target-list) is added as a new POSTQUEL command
- .)l
- In this notation we would specify the "Fred-Joe" rule as:
- .(l
- on NEW EMP.salary where EMP.name = "Fred"
- then do
- replace E (salary = CURRENT.salary)
- using E in EMP
- where E.name = "Joe"
-
- on NEW EMP.salary where EMP.name = "Joe"
- then do
- refuse
- .)l
- Notice that PRS II
- is less powerful than the "always" system because
- the Fred-Joe rule requires two specifications instead of one.
- .pp
- PRS II has both a query rewrite implementation and a trigger implementation, and it is an optimization
- decision which one to use as noted in [STON89b].
- For example, consider the rule:
- .(l
- on retrieve to SHOE-EMP
- then do
- retrieve (EMP.name, EMP.age, EMP.salary) where EMP.dept = "shoe"
- .)l
- Any
- query utilizing such a rule, e.g:
- .(l
- retrieve (SHOE-EMP.name) where SHOE-EMP.age < 40
- .)l
- would be processed by the
- rewrite implementation to:
- .(l
- retrieve (EMP.name) where EMP.age < 40 and EMP.dept = ``shoe''
- .)l
- As can be seen, this is identical to the
- query modification performed in relational view processing techniques [STON75].
- This rule could also be processed by the triggering system, in which case the rule would
- materialize the records in SHOE-EMP iteratively.
- .pp
- Moreover, it is straightforward to support
- additional functionality, such as allowing multiple queries in the
- definition of a view.
- Supporting materialized views can
- be efficiently done by
- .b caching
- the action part of the above rule, i.e. executing
- the command before a user requests evaluation.
- This corresponds to moving the rule to early evaluation.
- Lastly, supporting views that are partly materialized and partly specified as
- procedures, as well as views that involve recursion, appears fairly simple.
- In [STON89b] we present details on these extensions.
- .pp
- Consider the following collection of rules that support
- updates to SHOE-EMP:
- .(l
- on NEW SHOE-EMP
- then do
- append to EMP (name = NEW.name, salary = NEW.salary)
-
- on OLD SHOE-EMP
- then do
- delete EMP where EMP.name = OLD.name and EMP.salary = OLD.salary
-
- on UPDATE to SHOE-EMP
- then do
- replace EMP (name = NEW.name, salary = NEW.salary)
- where EMP.name = NEW.name
- .)l
- If these rules are processed by the trigger implementation, then an update to SHOE-EMP, e.g:
- .(l
- replace SHOE-EMP (salary = 1000) where SHOE-EMP.name = ``Mike''
- .)l
- will be processed
- normally until it generates a collection of
- .(l
- [new-record, old-record]
- .)l
- pairs. At this point the triggering system can be activated to make
- appropriate updates to underlying constructed types.
- Moreover, if a user wishes non-standard view update semantics, he can
- perform any particular actions he desires
- by changing the action part of the above rules.
- .pp
- PRS II thereby allows a user to use the rules system
- to define semantics for retrievals and updates to views. In fact, we expect to build a
- compiler that will convert a higher-level view notation into the needed collection of PRS II rules.
- In addition, PRS II retains all the functionality of the first rules system, so protection, alerters,
- integrity constraints, and arbitrary triggers are readily expressed.
- The only disadvantage is that PRS II requires two rules to perform
- many tasks expressible as a single PRS rule. To overcome this disadvantage, we
- will likely continue to support the PRS syntax in addition to the PRS II syntax
- and compile PRS into PRS II.
- .pp
- PRS II can be supported by the same implementation that
- we proposed for the query rewrite implementation of PRS,
- namely marking instances in the system catalogs. Moreover, the
- query rewrite
- algorithm is nearly the same as in the first implementation.
- The triggering system can be supported by the same instance markers as in PRS.
- In fact, the implementation is a bit simpler because
- a couple of the types of markers are not required.
- Because the implementation of PRS II is so similar to our initial
- rules system, we expect to
- have the conversion completed in the near future.
- .sh 1 "STORAGE SYSTEM"
- .sh 2 "Introduction"
- .pp
- When considering the POSTGRES storage system, we were guided by a missionary
- zeal to do something different. All current commercial systems use
- a storage manager with a write-ahead log (WAL), and we felt that this
- technology was well understood. Moreover, the original INGRES
- prototype from the 1970s used a similar storage manager, and we had no
- desire to do another implementation.
- .pp
- Hence, we seized on the idea of implementing a
- ``no-overwrite'' storage manager. Using this technique
- the old record remains in the data base whenever an update occurs, and
- serves the purpose
- normally performed by a write-ahead log. Consequently, POSTGRES has
- no log in the conventional sense of the term. Instead the
- POSTGRES log is simply 2 bits per transaction indicating whether each
- transaction committed, aborted, or is in progress.
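- .pp
- As an illustration of this point, the following minimal C sketch shows
- one way such a two-bit-per-transaction status array could be organized.
- The names, the fixed array size and the in-memory representation are our
- assumptions for the example, not the actual POSTGRES log code.
- .(l
- #include <assert.h>
- #include <stdint.h>
-
- /* Transaction states, two bits per transaction. */
- enum xact_state { XACT_IN_PROGRESS = 0, XACT_ABORTED = 1, XACT_COMMITTED = 2 };
-
- #define MAX_XACTS 4096
- static uint8_t xact_log[MAX_XACTS / 4];   /* four 2-bit entries per byte */
-
- static void set_xact_state(unsigned xid, enum xact_state s)
- {
-     unsigned byte = xid / 4, shift = (xid % 4) * 2;
-     xact_log[byte] = (uint8_t) ((xact_log[byte] & ~(3u << shift))
-                                 | ((unsigned) s << shift));
- }
-
- static enum xact_state get_xact_state(unsigned xid)
- {
-     return (enum xact_state) ((xact_log[xid / 4] >> ((xid % 4) * 2)) & 3u);
- }
-
- int main(void)
- {
-     set_xact_state(7, XACT_COMMITTED);
-     assert(get_xact_state(7) == XACT_COMMITTED);
-     assert(get_xact_state(8) == XACT_IN_PROGRESS);   /* default state */
-     return 0;
- }
- .)l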
- .pp
- Two very nice features can be exploited in a no-overwrite system. First,
- aborting a transaction can be instantaneous because one does not need
- to process the
- log undoing the effects of updates; the previous records are readily
- available in the data base. More generally, to recover from a crash,
- one must abort all the transactions in progress at the time of the crash.
- This process can be effectively instantaneous in POSTGRES.
- .pp
- The
- second benefit
- of a no-overwrite storage manager is the possibility of
- .b time
- .b travel.
- As noted earlier, a user can ask a historical query and POSTGRES will
- automatically return information from the record valid at the correct time.
- .pp
- This storage manager should be contrasted with a conventional
- one where the previous record is overwritten with a new one.
- In
- this case a write-ahead log is required to maintain the previous version
- of each record. There is no possibility of time travel because the
- log cannot be queried since it is in a
- different format. Moreover, the data base
- must be restored to a consistent state when a crash occurs by
- processing the log to undo any partially completed transactions.
- Hence, there is no possibility of instantaneous crash recovery.
- .pp
- Clearly a no-overwrite storage manager is superior to a conventional
- one if it can be implemented at comparable performance. There is
- a brief hand-wave of an argument in [STON87] that alleges this
- might be the case. In our opinion, the argument hinges around the
- existence of
- .b stable
- main memory. In the absence of stable memory, a no-overwrite
- storage manager must force to disk at commit time all
- pages written by a transaction.
- This is required because the effects of a committed
- transaction must be durable in case a crash occurs and
- main memory is lost.
- A conventional
- data manager, on the other hand, need only force to disk at commit time
- the log pages for the
- transaction's updates.
- Even if there are as
- many log pages as data pages (a highly
- unlikely occurrence), the conventional storage manager
- is doing sequential I/O to the log while a no-overwrite
- storage manager is doing random I/O. Since sequential I/O is substantially
- faster than random I/O, the no-overwrite solution is guaranteed to offer
- worse performance.
- .pp
- However, if stable main memory is present then neither solution must
- force pages to disk. In this environment, performance should be comparable.
- Hence,
- with stable main memory it appears that a no-overwrite solution is competitive.
- As computer manufacturers offer some form of stable
- main memory, a no-overwrite
- solution may become a viable storage option.
- .pp
- In designing the POSTGRES storage system, we were guided by two
- philosophical premises.
- First, we decided to make a clear distinction between
- current data and historical data.
- We expected
- access patterns to be highly skewed toward
- current records. In addition, queries to the archive might look very
- different from those accessing current data. For both reasons, POSTGRES
- maintains two different physical
- collections of records, one for the current data
- and one for historical data, each with its own indexes.
- .pp
- Second, our design assumes the existence
- of a randomly addressable archive device on which historical records
- are placed.
- Our intuitive model for this archive is an optical disk.
- Our design was purposely made
- consistent with an archive that has
- a write-once-read-many (WORM) orientation. This
- characterizes many of the
- optical disks on the market today.
- .pp
- In the next subsection we indicate two problems with the POSTGRES design.
- Then, in Section 5.3 we make additional comments on the storage manager.
- .sh 2 "Problems in the POSTGRES Design"
- .pp
- There are at least two problems with our design. First, it is unstable
- under heavy load. An asynchronous demon, known as the
- vacuum cleaner, is responsible for moving historical records from
- the magnetic disk structure holding the current records to
- the archive where historical records remain.
- Under normal circumstances,
- the magnetic
- disk portion of each constructed type is (say) only 1.1 times
- the minimum possible size
- of the constructed type. Of course, the vacuum cleaner consumes CPU and I/O
- resources running in background achieving this goal. However, if
- the load on a POSTGRES data base
- increases, then the vacuum cleaner may not get to
- run. In this case the magnetic
- disk portion of a constructed type will increase,
- and performance will suffer because the execution engine must read
- historical records on the magnetic disk during the (presumably frequent)
- processing
- of queries to the current data base. As a result,
- performance will degrade proportionally to the excess size of
- the magnetic disk portion of the data base. As load increases, the
- vacuum cleaner gets less resources, and performance
- degrades as the size of the magnetic disk data base increases. This
- will ultimately result in a POSTGRES data base going into meltdown.
- .pp
- Obviously,
- the vacuum cleaner should be run in background if possible so that
- it can consume resources at 2:00 A.M. when there is little other activity.
- However, if there is consistent heavy load on a system, then
- the vacuum cleaner must
- be scheduled at the same priority as other tasks, so the above instability
- does not occur. The bottom line is that scheduling the vacuum cleaner is
- a tricky optimization problem.
- .pp
- The second problem is that future archive systems are
- likely to be read/write; rewritable optical disks have already
- appeared on the
- market. Consequently, there was no reason for us to have
- restricted ourselves to
- WORM technology.
- Certain POSTGRES assumptions were therefore unnecessary, such as
- requiring the current portion
- of any constructed type to be on magnetic disk.
- .sh 2 "Other Comments"
- .pp
- Historical indexes will usually be on
- a combined key consisting of a time range together with one or
- more keys from the record itself. Such two-dimensional
- indexes can be stored using the technology of R-trees [GUTM84],
- R+-trees [FALO87] or perhaps in some
- new way. We are not particularly comfortable that good ways to
- index time ranges have been found, and we encourage additional
- work in this area.
- A possible approach is segmented R-trees which we are studying [KOLE89].
- .pp
- Another comment concerns POSTGRES support for time travel. There are many
- tasks that are very difficult to express with our mechanisms. For example, the query to
- find the time at which Sam's salary increased from $5000 to $6000
- is very tricky in POSTQUEL.
- .pp
- A last comment is that time travel can be implemented with a
- conventional transaction system using a write ahead log.
- For example,
- one need only have an ``archive'' constructed type for each physical
- constructed type for which
- time travel is desired. When a record is updated, its previous value is written
- in the archive with the appropriate timestamps. If the transaction fails to
- commit, this archive insert and the corresponding
- record update are unwound using a conventional
- log. Such an implementation may well have substantial
- benefits, and we should have probably considered such a
- possibility.
- In making storage system decisions we were guided by
- a missionary zeal
- to do something different than a conventional write ahead log
- scheme.
- Hence, we may have overlooked other intriguing options.
- .sh 1 "THE POSTGRES IMPLEMENTATION"
- .sh 2 "Introduction"
- .pp
- POSTGRES contains a fairly conventional parser, query optimizer and
- execution engine. Two aspects of the implementation deserve special
- mention,
- .(l
- dynamic loading and the process structure
- choice of implementation language
- .)l
- and we discuss each in turn.
- .sh 2 "Dynamic Loading and Process Structure"
- .pp
- POSTGRES assumes that data types, operators
- and functions can be added and subtracted dynamically, i.e. while
- the system is executing. Moreover, we have designed the system
- so that it can accommodate a potentially very large number of
- types and operators.
- Consequently, the user functions that support the implementation of a type must
- be dynamically loaded and unloaded. Hence, POSTGRES maintains a
- cache of currently loaded functions and dynamically moves functions
- into the cache and then ages them out of the cache. Moreover,
- the parser and optimizer run off of a main memory cache of information
- about types and operators. Again this cache must be maintained
- by POSTGRES software. It would have been much easier to assume that
- all types and operators were linked into the system at POSTGRES
- initialization time and have required a user to reinstall
- POSTGRES when he wished to add or drop types.
- Moreover, users of prototype software are not running production systems
- which cannot be taken down for rebooting. Hence, this function is not essential.
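- .pp
- The paper does not prescribe a particular loading mechanism. Purely as
- an illustration of the general technique, the following C sketch caches
- dynamically loaded user functions through the POSIX dlopen interface and
- ages out the least recently used entry; all names here are hypothetical
- and are not the POSTGRES internals.
- .(l
- #include <dlfcn.h>
- #include <string.h>
-
- #define CACHE_SIZE 8
-
- struct cached_func {
-     char     name[64];     /* function name; empty if the slot is unused */
-     void    *handle;       /* shared object handle from dlopen() */
-     void   (*fn)(void);    /* entry point resolved by dlsym() */
-     unsigned last_used;    /* logical clock for LRU aging */
- };
-
- static struct cached_func cache[CACHE_SIZE];
- static unsigned clock_tick;
-
- /* Return a user function, loading its shared object if necessary and
-    aging the least recently used entry out of the cache when full. */
- void (*lookup_func(const char *so_path, const char *name))(void)
- {
-     int i, victim = 0;
-
-     for (i = 0; i < CACHE_SIZE; i++) {
-         if (cache[i].handle && strcmp(cache[i].name, name) == 0) {
-             cache[i].last_used = ++clock_tick;
-             return cache[i].fn;                /* cache hit */
-         }
-         if (cache[i].last_used < cache[victim].last_used)
-             victim = i;                        /* remember the LRU slot */
-     }
-
-     if (cache[victim].handle)
-         dlclose(cache[victim].handle);         /* unload the aged-out entry */
-
-     cache[victim].handle = dlopen(so_path, RTLD_NOW);
-     if (cache[victim].handle == NULL)
-         return NULL;
-     cache[victim].fn = (void (*)(void)) dlsym(cache[victim].handle, name);
-     strncpy(cache[victim].name, name, sizeof(cache[victim].name) - 1);
-     cache[victim].last_used = ++clock_tick;
-     return cache[victim].fn;
- }
- .)l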
- .pp
- Second, the rules system forces significant complexity on the design.
- A user can add a rule such as:
- .(l
- always retrieve (EMP.salary)
- where EMP.name = "Joe"
- .)l
- In this case his application process wishes to be notified of any
- salary adjustment to Joe. Consider a second user who gives
- Joe a raise. The POSTGRES process that actually does the adjustment
- will notice that a marker has been placed on the salary field. However,
- in order to alert the first user, one of four things must happen:
-
- a) POSTGRES could be designed as a single server process. In this case
- the first user's query could simply be activated within the current process.
- However, such a design is incompatible with running on a shared memory
- multiprocessor, where a so-called multi-server is required. Hence, this
- design was discarded.
-
- b) The POSTGRES process for the second user could run the first user's
- query and then connect to his application process to deliver results.
- This requires that an application process be coded to expect communication
- from random other processes. We felt this was too difficult to
- be a reasonable solution.
-
- c) The POSTGRES process for the second user could connect to the input
- socket for the first user's POSTGRES and deliver the query to be run.
- The first POSTGRES would run the query and then send results to the
- user. This would require careful synchronization of the input socket
- among multiple independent command streams. Moreover, it would
- require the second POSTGRES to know the portal name on which the first
- user's rule was running.
-
- d) The POSTGRES process for the second user could alert a special
- process called the
- .b POSTMASTER.
- This process would in turn alert the process for the first user where the query
- would be run and the results delivered to the application process.
-
- We have adopted the fourth design as the only one we thought was practical.
- However, we have thereby constructed a process through which everybody
- must channel communications. If the POSTMASTER crashes, then
- the whole POSTGRES environment must be restarted. This is a handicap, but
- we could think of no better solution. Moreover, there are a collection
- of system demons, including the vacuum cleaner mentioned above, which need
- a place to run. In POSTGRES they are run as subprocesses managed by
- the POSTMASTER.
- .pp
- A last aspect of our design concerns the operating system
- process structure.
- Currently, POSTGRES runs as one
- process for each active user. This was done as an expedient to
- get a system operational as quickly as possible. We plan on
- converting POSTGRES to use lightweight processes available in
- the operating systems we are using. These include PRESTO
- for the Sequent Symmetry and threads in Version 4 of Sun/OS.
- .sh 2 "Programming Language Used"
- .pp
- At the beginning of the project, we were forced to make
- a commitment to a programming language and machine environment.
- The machine was an easy one, since
- SUN workstations were nearly omnipresent at Berkeley, and any other
- choice would have been non-standard. However, we were free to
- choose any language in which to program. We considered the following:
- .(l
- C
- C++
- MODULA 2+
- LISP
- ADA
- SMALLTALK
- .)l
- We dismissed SMALLTALK quickly because we felt it was too
- slow and compilers were not readily available for a wide
- variety of platforms. We felt it desirable to keep open the
- option of distributing our software widely.
- We felt ADA and MODULA 2+ offered limited advantages over C++ and
- were not widely used in the Berkeley environment. Hence, obtaining
- pretrained programmers would have been a problem. Lastly, we
- were not thrilled to use C, since INGRES had been coded in C and we were anxious
- to choose a different language, if only for the sake of doing something
- different. At the time we started (10/85), there was not a stable C++
- compiler, so we did not seriously consider this option.
- .pp
- By a process of elimination, we decided to try writing POSTGRES
- in LISP. We expected that it would be especially
- easy to write the optimizer and inference engine in LISP, since
- both are mostly tree processing modules. Moreover, we
- were
- seduced by AI claims of high programmer productivity for applications
- written in LISP.
- .pp
- We soon realized that parts of the system were more easily coded in C,
- for example the buffer manager which
- moves 8K pages back and forth
- to the disk and uses a modified LRU algorithm to control what pages
- are resident.
- Hence, we adopted the
- policy that we would use both C and LISP and code
- modules of POSTGRES in whichever language was most appropriate.
- By the time Version 1 was operational, it contained
- about 17000 lines of LISP and about 63000
- lines of C.
- .pp
- Our feeling is that the use of LISP has been a terrible mistake
- for several reasons.
- First, current LISP environments
- are very large. To run a ``nothing'' program in LISP requires
- about 3 mbytes of address space. Hence, POSTGRES exceeds 4 mbytes in
- size, of which all but 1 mbyte is the LISP compiler, editor and assorted other
- functions that are neither required nor desired. As a result, we suffer from a
- gigantic footprint. Second, a DBMS never wants to stop when
- garbage collection happens. Any response time sensitive
- program must therefore allocate and deallocate space manually, so that
- garbage collection never happens during normal processing. Consequently, we
- spent extra effort ensuring that LISP garbage collection is not
- used by POSTGRES. Hence, this aspect of LISP, which improves
- programmer productivity, was not available to us.
- Third, LISP execution is slow. As noted in the performance figures
- in the next section our
- LISP code is more than twice as slow as the comparable C code.
- Of course, it is possible that we are not skilled LISP programmers
- or do not know how to optimize the language; hence our experience should be
- suitably discounted.
- .pp
- However, none of these irritants
- was the real disaster. We have found that debugging a two language system is
- extremely difficult.
- The C debugger, of course, knows nothing about LISP while the LISP
- debugger knows nothing about C. As a result, we have found debugging
- POSTGRES to be a painful and frustrating task.
- Memory allocation bugs were among the most painful since LISP and C have
- very different models of dynamic memory.
- Of course, it is true that the
- optimizer and inference engine were easier to code in LISP. Hence,
- we saved some time there. However, this was more than compensated
- by the requirement of writing a lot of utility code that would convert
- LISP data structures into C and vice versa. In fact, our assessment
- is that the primary productivity increases in LISP come from the nice
- programming environment (e.g. interactive debugger, nice workstation
- tools, etc.) and not from the language itself. Hence, we would encourage
- the implementors of other programming languages
- to study the LISP environment carefully and
- implement the better ideas.
- .pp
- As a result we have just finished moving our 17000 lines of
- LISP to C to
- avoid the debugging hassle and secondarily to avoid the performance
- and footprint problems in LISP. Our experience with LISP and two
- language systems has
- not been positive, and we would caution others not to follow in our
- footsteps.
- .sh 1 "STATUS AND PERFORMANCE
- .pp
- At the current time (October 1989) the LISP-less Version 1
- of POSTGRES has been in
- the hands of users for a short time, and we are shaking the last bugs
- out of the C port. In addition, we have designed all of
- the additional functionality to appear in Version 2.
- The characteristics of Version 1 are:
-
- a) The query language POSTQUEL runs except for aggregates, functions and
- set operators.
-
- b) All object management capabilities are operational
- except POSTQUEL types.
-
- c) Some support for rules exists. Specifically,
- replace always commands are operational; however the implementation
- currently only supports early evaluation and only with
- markers on whole columns.
-
- d) The storage system is complete. However, we are taking delivery
- shortly on an optical disk jukebox, and so the archive is
- currently not implemented on a real optical disk. Moreover,
- R-trees to support time travel are not yet implemented.
-
- e) Transaction management runs.
-
- .pp
- The focus has been on getting the function in POSTGRES to
- run. So far, only minimal attention has been paid to
- performance.
- Figure 1 shows assorted queries in the
- Wisconsin benchmark and gives results for three systems
- running on
- a SUN 3/280. All numbers are run on a non-quiescent system
- so there may be significant fluctuations.
- The first two are the C and LISP versions of
- POSTGRES. These are functionally identical systems with the same
- algorithms embodied in the code. The footprint of the
- LISP system is about 4.5 Mbytes
- while the C system is about 1 Mbyte. For comparison purposes
- we also include the performance numbers for the commercial version of
- INGRES in the third column.
- .(z
- .hl
- .(c
- .TS
- c c c c
- l l l l.
- POSTGRES POSTGRES INGRES
- C-based LISP-based RTI 5.0
-
- nullqry 0.4 0.3 0.2
-
- scan 10Ktups 36. 180. 5.2
-
- retrieve into query
- 1% selectivity 38. n/a 9.9
-
- append to 10Ktup 4.7 180. 0.4
-
- delete from 10Ktup 37. n/a 5.7
-
- replace in 10Ktup 42. 280. 5.7
-
- .TE
- .)c
- .sp
- .ce
- A Comparison of INGRES and POSTGRES
- .ce
- (Times are listed in seconds per query.)
- .ce
- Figure 1
- .hl
- .)z
- As can be seen, the LISP system is several times slower than the C system.
- In various other benchmarks we have never seen the C system less than twice
- as fast as the LISP system. Moreover, the C system is several
- times slower than a
- commercial system. The public domain version of INGRES that we worked on in the
- mid-1970s is about a factor of two slower than commercial INGRES. Hence, it
- appears that POSTGRES is about one-half the speed of the original INGRES.
- There are substantial inefficiencies in POSTGRES, especially in the
- code which checks that a retrieved record is valid.
- We expect that subsequent tuning
- will get us somewhere in between the performance of
- public domain INGRES and RTI INGRES.
- .sh 1 "CONCLUSIONS"
- .pp
- In this section we summarize our opinions about certain aspects of
- the design of POSTGRES. First, we are uneasy about the complexity of
- the POSTGRES data model.
- The comments in Section 2 all contain suggestions to make it more complex.
- Moreover, other research teams have tended to construct even more
- complex data models, e.g. EXTRA [CARE88]. Consequently, a simple concept
- such as referential integrity, which can be done in only one way in
- existing commercial systems, can be done in several different ways in
- POSTGRES. For example, the user can implement an abstract data type and
- then do the required checking in the input conversion routine. Alternately,
- he can use a rule in the POSTGRES rules system. Lastly, he can use a POSTQUEL function
- for the field that corresponds to the foreign key in a current relational
- system. There are complex performance tradeoffs between these three
- solutions, and a decision must be made by a sophisticated application
- designer. We fear that real users, who have a hard time with data base design
- for existing relational systems, will find the next-generation
- data models, such as the one in POSTGRES, impossibly complex.
- The problem is that applications exist where each representation
- is the only acceptable one. The demand for wider application
- of data base technology ensures that vendors will produce
- systems with these more complex data models.
- .pp
- Another source of uneasiness is the fact that rules and POSTQUEL
- functions have substantial overlap in function. For example,
- a POSTQUEL function can be simulated
- by one rule per record, albeit at some performance penalty. On the
- other hand, all rules, except retrieve always commands, can be alternatively
- implemented using POSTQUEL functions. We expect to merge the two concepts in
- Version 2, and our proposal appears in [STON89b].
- .pp
- In the areas of rules and storage management, we are basically satisfied
- with POSTGRES capabilities. The syntax of the rule system should be
- changed as noted in Section 3; however this is not a significant
- issue and it should be available easily in Version 2. The storage manager
- has been quite simple to implement. Crash recovery code has been
- easy to write because the only routine which must be carefully written is
- the vacuum cleaner. Moreover, access
- to past history seems to be a highly desirable
- capability.
- .pp
- Furthermore, the POSTGRES implementation certainly erred in the direction of
- excessive sophistication. For example, new types and functions
- can be added on the fly without recompiling POSTGRES. It would have
- been much simpler to construct a system that required recompilation
- to add a new type. Second, we have implemented a complete
- transaction system in Version 1. Other prototypes tend to assume a single
- user environment.
- In these and many other ways, we strove for
- substantial generality; however the net effect has been to slow down the
- implementation effort and make the POSTGRES internals much more
- complex. As a result, POSTGRES has taken us considerably longer to build
- than the original version of INGRES.
- One could call this the ``second system'' effect. It was essential that
- POSTGRES be more usable than the original INGRES prototype in
- order for us to feel like we were making a contribution.
- .pp
- A last comment concerns technology transfer to commercial systems. It
- appears that the process is substantially accelerating. For example,
- the relational model was constructed in 1970, first prototypes of
- implementations appeared around 1976-77, commercial versions first
- surfaced around 1981 and popularity of relational systems in the
- marketplace occurred around 1985. Hence, there was a 15
- year period during which the ideas were transferred to commercial systems.
- Most of the ideas in POSTGRES and in other next generation systems
- date from 1984 or later. Commercial systems embodying some of these
- ideas have already appeared and major vendors are expected to have
- advanced systems within the next year or two. Hence, the 15 year period
- appears to have shrunk to less than half that amount. This acceleration
- is impressive, but it will lead to rather short lifetimes for the
- current collection of prototypes.
-
- .vs 12
- \fBREFERENCES\fP
- .nr ii 10m
- .ip [AGRA89]
- Agrawal, R. and Gehani, N., "ODE: The Language and the Data Model," Proc. 1989 ACM-SIGMOD
- Conference on Management of Data, Portland, Or., May 1989.
- .ip [ATKI89]
- Atkinson, M. et. al., ``The Object-oriented Database System Manifesto,'' Altair Technical Report 30-89,
- Rocquencourt, France, August 1989.
- .ip [ANON85]
- Anon et. al.,
- .q "A Measure of Transaction Processing Power,"
- Tandem Computers, Cupertino, CA, Technical
- Report 85.1, 1985.
- .ip [AOKI89]
- Aoki, P., ``Implementation of Extended Indexes in POSTGRES,'' Electronics
- Research Laboratory, University of California, Technical Report 89-62,
- July 1989.
- .ip [BANC86]
- Bancilhon, F. and Ramakrishnan, R., ``An Amateur's Introduction to Recursive
- Query Processing,'' Proc. 1986 ACM-SIGMOD Conference on Management of
- Data, Washington, D.C., May 1986.
- .ip [BANE87]
- Banerjee, J. et. al., ``Semantics and Implementation of Schema Evolution
- in Object-oriented Databases,'' Proc. 1987 ACM SIGMOD Conference on
- Management of Data, San Francisco, Ca., May 1987.
- .ip [BITT83]
- Bitton, D. et. al.,
- ``Benchmarking Database Systems: A Systematic Approach,'' Proc. 1983 VLDB
- Conference, Cannes, France, Sept. 1983.
- .ip [BORG85]
- Borgida, A., ``Language Features for Flexible Handling of Exceptions
- in Information Systems,'' ACM-TODS, Dec. 1985.
- .ip [CARE88]
- Carey, M. et. al., ``A Data Model and Query Language for EXODUS,''
- Proc. 1988 ACM-SIGMOD Conference on Management of Data, Chicago, Ill.,
- June 1988.
- .ip [COPE84]
- Copeland, G. and D. Maier, ``Making Smalltalk a Database System,''
- Proc. 1984 ACM-SIGMOD Conference on Management of Data, Boston, Mass.
- June 1984.
- .ip [DADA86]
- Dadam, P. et. al., ``A DBMS Prototype to Support NF2 Relations,'' Proc.
- 1986 ACM-SIGMOD Conference on Management of Data, Washington, D.C., May 1986.
- .ip [DATE81]
- Date, C., ``Referential Integrity,'' Proc. Seventh International VLDB
- Conference, Cannes, France, Sept. 1981.
- .ip [ESWA76]
- Eswaren, K., ``Specification, Implementation and Interactions of a
- Rule Subsystem in an Integrated Database System,'' IBM Research,
- San Jose, Ca., Research Report RJ1820, August 1976.
- .ip [FALO87]
- Faloutsos, C. et. al., ``Analysis of Object Oriented Spatial Access Methods,''
- Proc. 1987 ACM-SIGMOD Conference on Management
- of Data, San Francisco, Ca., May 1987.
- .ip [GUTM84]
- Gutman, A., ``R-trees: A Dynamic Index Structure for Spatial Searching,''
- Proc. 1984 ACM-SIGMOD Conference on Management of Data, Boston, Mass.
- June 1984.
- .ip [KOLE89]
- Kolovson, C. and Stonebraker, M., ``Segmented Search Trees and
- their Application to Data Bases,'' (in preparation).
- .ip [LYNC88]
- Lynch, C. and Stonebraker, M., ``Extended User-Defined Indexing with
- Application to Textual Databases,'' Proc. 1988 VLDB Conference,
- Los Angeles, Ca., Sept. 1988.
- .ip [MAIE89]
- Maier, D., ``Why Isn't There an Object-oriented Data Model?'' Proc.
- 11th IFIP World Congress, San Francisco, Ca., August 1989.
- .ip [OSBO86]
- Osborne, S. and Heaven, T., ``The Design of a Relational System with Abstract
- Data Types as Domains,'' ACM TODS, Sept. 1986.
- .ip [RICH87]
- Richardson, J. and Carey, M., ``Programming Constructs for Database System
- Implementation in EXODUS,'' Proc. 1987 ACM-SIGMOD Conference
- on Management of Data, San Francisco, Ca., May 1987.
- .ip [ROWE87]
- Rowe, L. and Stonebraker, M., "The POSTGRES Data Model,"
- Proc. 1987 VLDB Conference, Brighton, England, Sept 1987.
- .ip [ROWE89]
- Rowe, L. et. al., ``The Design and Implementation
- of Picasso,'' (in preparation).
- .ip [SELL86]
- Sellis, T., ``Global Query Optimization,'' Proc 1986 ACM-SIGMOD Conference on
- Management of Data, Washington, D.C., June 1986.
- .ip [STON75]
- Stonebraker, M., ``Implementation of Integrity Constraints and
- Views by Query Modification,'' Proc. 1975 ACM-SIGMOD Conference,
- San Jose, Ca., May 1975.
- .ip [STON82]
- Stonebraker, M. et. al., ``A Rules System for a Relational Data Base Management
- System,'' Proc. 2nd International Conference on Databases, Jerusalem,
- Israel, June 1982 (available from Academic press).
- .ip [STON86]
- Stonebraker, M. and Rowe, L.,
- .q "The Design of POSTGRES,"
- Proc.
- 1986 ACM-SIGMOD Conference, Washington, D.C., June 1986.
- .ip [STON86b]
- Stonebraker, M., ``Inclusion of New Types in Relational Data Base Systems,''
- Proc. Second International Conference on Data Engineering, Los Angeles,
- Ca., Feb. 1986.
- .ip [STON87]
- Stonebraker, M., "The POSTGRES Storage System," Proc.
- 1987 VLDB Conference, Brighton, England, Sept. 1987.
- .ip [STON87b]
- Stonebraker, M. et. al., ``Extensibility in POSTGRES,'' IEEE Database
- Engineering, Sept. 1987.
- .ip [STON88]
- Stonebraker, M. et. al., "The POSTGRES Rules System,"
- IEEE Transactions on Software Engineering, July 1988.
- .ip [STON89]
- Stonebraker, M. et. al., ``Commentary on the POSTGRES Rules
- System,'' SIGMOD RECORD, Sept. 1989.
- .ip [STON89b]
- Stonebraker, M. et. al., ``Rules, Procedures and Views,'' (in preparation).
- .ip [ULLM85]
- Ullman, J., ``Implementation of Logical Query Languages for Databases,''
- ACM-TODS, Sept. 1985.
- .ip [VELE89]
- Velez, F. et. al., ``The O2 Object manager: An Overview,'' GIP ALTAIR,
- Le Chesnay, France, Technical Report 27-89, February 1989.
- .ip [WENS88]
- Wensel, S. (ed.),
- .q "The POSTGRES Reference Manual,"
- Electronics Research
- Laboratory, University of California, Berkeley, CA, Report M88/20, March 1988.
- .ip [WIDO89]
- Widom, J. and Finkelstein, S., ``A Syntax and Semantics for Set-oriented
- Production Rules in Relational Data Bases,'' IBM Research, San Jose, Ca.,
- June 1989.
-
-